Parsimonious Labeling
We propose a new family of discrete energy minimization problems, which we
call parsimonious labeling. Specifically, our energy functional consists of
unary potentials and high-order clique potentials. While the unary potentials
are arbitrary, the clique potentials are proportional to the {\em diversity} of
set of the unique labels assigned to the clique. Intuitively, our energy
functional encourages the labeling to be parsimonious, that is, use as few
labels as possible. This in turn allows us to capture useful cues for important
computer vision applications such as stereo correspondence and image denoising.
Furthermore, we propose an efficient graph-cuts based algorithm for the
parsimonious labeling problem that provides strong theoretical guarantees on
the quality of the solution. Our algorithm consists of three steps. First, we
approximate a given diversity using a mixture of a novel hierarchical
Potts model. Second, we use a divide-and-conquer approach for each mixture
component, where each subproblem is solved using an efficient
α-expansion algorithm. This provides us with a small number of putative
labelings, one for each mixture component. Third, we choose the best putative
labeling in terms of the energy value. Using both synthetic and standard real
datasets, we show that our algorithm significantly outperforms other graph-cuts
based approaches.
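The energy functional described above can be illustrated with a toy sketch (all names are hypothetical; the number of distinct labels in a clique stands in for a general diversity function), showing how assigning fewer labels within a clique lowers the energy:

```python
import numpy as np

def parsimonious_energy(labeling, unary, cliques, w=1.0):
    """Toy energy: sum of unary potentials plus, for each high-order
    clique, a penalty proportional to the number of distinct labels
    the clique contains (a simple 'diversity' function)."""
    labeling = np.asarray(labeling)
    # Unary term: arbitrary cost of assigning labeling[i] to variable i.
    e = sum(unary[i, labeling[i]] for i in range(len(labeling)))
    # Clique term: fewer unique labels inside a clique -> lower energy.
    for c in cliques:
        e += w * len(set(labeling[i] for i in c))
    return e
```

With zero unary costs and a single clique over all four variables, the uniform labeling is cheaper than the alternating one, which is exactly the parsimony bias the abstract describes.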
Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation
We propose an approach to discover class-specific pixels for the
weakly-supervised semantic segmentation task. We show that properly combining
saliency and attention maps allows us to obtain reliable cues capable of
significantly boosting the performance. First, we propose a simple yet powerful
hierarchical approach to discover the class-agnostic salient regions, obtained
using a salient object detector, which otherwise would be ignored. Second, we
use fully convolutional attention maps to reliably localize the class-specific
regions in a given image. We combine these two cues to discover class-specific
pixels which are then used as an approximate ground truth for training a CNN.
While solving the weakly supervised semantic segmentation task, we ensure that
the image-level classification task is also solved, in order to force the CNN
to assign at least one pixel to each object present in the image.
Experimentally, on the PASCAL VOC12 val and test sets, we obtain mIoUs of
60.8% and 61.9%, achieving gains of 5.1% and 5.2% over
the published state-of-the-art results. The code is made publicly available.
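The cue-combination step can be sketched as follows (a minimal illustration with hypothetical thresholds and function names, not the paper's exact procedure): pixels that are both salient and strongly attended become class-specific foreground, pixels that are neither become background, and everything else is left as ignore for CNN training.

```python
import numpy as np

def class_specific_cues(saliency, attention, t_sal=0.5, t_att=0.5, ignore=255):
    """Combine a class-agnostic saliency map (H, W) with per-class
    attention maps (C, H, W) into an approximate pixel-level ground
    truth: 0 = background, 1..C = classes, `ignore` = uncertain."""
    C, H, W = attention.shape
    best = attention.argmax(axis=0)   # most-attended class per pixel
    conf = attention.max(axis=0)      # its attention score
    labels = np.full((H, W), ignore, dtype=np.int64)
    # Neither salient nor attended -> confident background.
    labels[(saliency < t_sal) & (conf < t_att)] = 0
    # Both salient and attended -> confident class-specific foreground.
    fg = (saliency >= t_sal) & (conf >= t_att)
    labels[fg] = best[fg] + 1
    return labels
```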
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection
We propose an extremely simple and highly effective approach to faithfully
combine different object detectors to obtain a Mixture of Experts (MoE) that
has superior accuracy to the individual experts in the mixture. We find that
naively combining these experts in a manner similar to the well-known Deep
Ensembles (DEs) does not result in an effective MoE. We identify the
incompatibility between the confidence score distributions of different
detectors as the primary reason for such failure cases. Therefore, to
construct the MoE, we propose to first calibrate each individual detector
against a target calibration function, and then filter and refine all the
predictions from the detectors in the mixture. We term this approach
MoCaE and demonstrate its effectiveness through extensive experiments on object
detection, instance segmentation and rotated object detection tasks.
Specifically, MoCaE improves (i) three strong object detectors on COCO
test-dev; (ii) instance segmentation methods on the challenging long-tailed
LVIS dataset; and (iii) all existing rotated object detectors on the DOTA
dataset, establishing a new state-of-the-art (SOTA). Code will be made public.
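The calibrate-then-combine idea can be sketched minimally (all names are hypothetical; a toy linear calibrator and plain greedy NMS stand in for the paper's fitted calibration functions and refinement step):

```python
import numpy as np

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def combine_calibrated_experts(experts, calibrators, iou_thr=0.5):
    """experts: list of (boxes, scores) per detector.
    calibrators: one monotone score-mapping function per detector,
    bringing raw confidences onto a common, comparable scale."""
    boxes, scores = [], []
    for (b, s), cal in zip(experts, calibrators):
        boxes.append(np.asarray(b, float))
        scores.append(cal(np.asarray(s, float)))  # align score scales
    boxes, scores = np.concatenate(boxes), np.concatenate(scores)
    keep = []
    for i in np.argsort(-scores):                 # greedy NMS on the pool
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return boxes[keep], scores[keep]
```

The point of the sketch is the failure mode the abstract names: an overconfident expert would dominate a naive merge, whereas after calibration the better-calibrated prediction wins.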
Stable Rank Normalization for Improved Generalization in Neural Networks and GANs
Exciting new work on the generalization bounds for neural networks (NN) given
by Neyshabur et al. and Bartlett et al. closely depends on two
parameter-dependent quantities: the Lipschitz constant upper bound and the
stable rank (a softer version of the rank operator). This leads to an
interesting question of whether controlling these quantities might improve the
generalization behaviour of NNs. To this end, we propose stable rank
normalization (SRN), a novel, optimal, and computationally efficient
weight-normalization scheme which minimizes the stable rank of a linear
operator. Surprisingly, we find that SRN, despite being a non-convex problem,
can be shown to have a unique optimal solution. Moreover, we show that SRN
allows control of the data-dependent empirical Lipschitz constant, which in
contrast to the Lipschitz upper-bound, reflects the true behaviour of a model
on a given dataset. We provide thorough analyses to show that SRN, when applied
to the linear layers of a NN for classification, provides striking
improvements: 11.3% on the generalization gap compared to the standard NN,
along with a significant reduction in memorization. When applied to the
discriminator of GANs (called SRN-GAN), it improves Inception, FID, and Neural
divergence scores on the CIFAR-10/100 and CelebA datasets, while learning
mappings with low empirical Lipschitz constants.
Comment: Accepted at the International Conference on Learning Representations,
2020, Addis Ababa, Ethiopia.
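The stable rank itself is easy to compute, and a simplified stand-in for the SRN projection (keep the top singular value, rescale the remaining spectrum to hit a target stable rank; this is an illustrative scheme, not the paper's exact optimal one) can be sketched as:

```python
import numpy as np

def stable_rank(W):
    """||W||_F^2 / ||W||_2^2 -- a soft, perturbation-robust rank."""
    s = np.linalg.svd(W, compute_uv=False)
    return (s ** 2).sum() / s[0] ** 2

def srn_like(W, target):
    """Illustrative projection: preserve the spectral norm (top singular
    value) and uniformly rescale the tail of the spectrum so the stable
    rank of the result equals `target` (assumed >= 1)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    tail = (s[1:] ** 2).sum()
    if tail == 0 or target >= stable_rank(W):
        return W  # already at or below the target
    c = np.sqrt((target - 1.0) * s[0] ** 2 / tail)
    s_new = np.concatenate([[s[0]], c * s[1:]])
    return U @ np.diag(s_new) @ Vt
```

Because the spectral norm is untouched, only the Frobenius mass of the lower singular values shrinks, which is the sense in which the stable rank is "softer" than the rank.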
Rounding-based Moves for Semi-Metric Labeling
Semi-metric labeling is a special case of energy minimization for pairwise Markov random fields. The energy function consists of arbitrary unary potentials, and pairwise potentials that are proportional to a given semi-metric distance function over the label set. Popular methods for solving semi-metric labeling include (i) move-making algorithms, which iteratively solve a minimum st-cut problem; and (ii) the linear programming (LP) relaxation based approach. In order to convert the fractional solution of the LP relaxation to an integer solution, several randomized rounding procedures have been developed in the literature. We consider a large class of parallel rounding procedures, and design move-making algorithms that closely mimic them. We prove that the multiplicative bound of a move-making algorithm exactly matches the approximation factor of the corresponding rounding procedure for any arbitrary distance function. Our analysis includes all known results for move-making algorithms as special cases.
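The energy being minimized can be written down directly; a toy sketch (hypothetical names) with a semi-metric distance matrix over the label set:

```python
def semimetric_energy(labeling, unary, edges, dist, w=1.0):
    """Pairwise MRF energy: arbitrary unary potentials plus pairwise
    terms proportional to a semi-metric d over the labels
    (d(a, a) = 0 and d(a, b) = d(b, a) > 0 for a != b)."""
    e = sum(unary[i][labeling[i]] for i in range(len(labeling)))
    e += sum(w * dist[labeling[i]][labeling[j]] for i, j in edges)
    return e
```

Both families of methods in the abstract (move-making and LP rounding) are approximation schemes for minimizing exactly this quantity over all labelings.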
Continual Learning in Low-rank Orthogonal Subspaces
In continual learning (CL), a learner is faced with a sequence of tasks,
arriving one after the other, and the goal is to remember all the tasks once
the continual learning experience is finished. The prior art in CL uses
episodic memory, parameter regularization or extensible network structures to
reduce interference among tasks, but in the end, all the approaches learn
different tasks in a joint vector space. We believe this invariably leads to
interference among different tasks. We propose to learn tasks in different
(low-rank) vector subspaces that are kept orthogonal to each other in order to
minimize interference. Further, to keep the gradients of different tasks coming
from these subspaces orthogonal to each other, we learn isometric mappings by
posing network training as an optimization problem over the Stiefel manifold.
To the best of our understanding, we report, for the first time, strong results
over the experience-replay baseline, with and without memory, on standard
classification benchmarks in continual learning. The code is made publicly
available.
Comment: The paper is accepted at NeurIPS 2020.
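The subspace idea can be sketched as follows (a simplified illustration with randomly chosen, mutually orthogonal low-rank bases; the paper additionally learns isometric mappings via optimization over the Stiefel manifold, which this sketch omits):

```python
import numpy as np

def task_subspaces(dim, n_tasks, rank, seed=0):
    """Partition the columns of a random orthogonal matrix into
    mutually orthogonal low-rank bases, one per task."""
    assert n_tasks * rank <= dim
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return [Q[:, t * rank:(t + 1) * rank] for t in range(n_tasks)]

def project(x, basis):
    """Project a vector (e.g. a task's gradient) onto its subspace."""
    return basis @ (basis.T @ x)
```

Because the bases are orthogonal to one another, updates projected into different tasks' subspaces have zero inner product, which is the interference-minimization property the abstract argues for.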